Background: The Anteromedial Reach Test is a performance-based outcome measure for evaluating dynamic knee stability in patients with anterior cruciate ligament injury. No previously published study has adequately evaluated intrarater or interrater reliability of the Anteromedial Reach Test, so the purpose of this study was to assess these measurement properties in healthy participants prior to their investigation in patients with anterior cruciate ligament injury.
Methods: Two raters (A and B) tested 39 healthy university staff and students (20 men, 19 women). For the intrarater reliability investigation, rater A tested participants on three separate test occasions (days 1, 2, and 3) at the same time of day. For the interrater reliability investigation, raters A and B independently tested participants on the same test occasion (day 3).
Results: There was no significant systematic bias between test occasions or raters. Values of the intraclass correlation coefficient (2,1) were 0.96 for intrarater reliability of both the dominant leg and nondominant leg and 0.97 (dominant leg) and 0.98 (nondominant leg) for interrater reliability. Values for the standard error of measurement were 1.46 (dominant leg) and 1.62 (nondominant leg) for the intrarater investigation, and 1.26 (dominant leg) and 1.04 (nondominant leg) for the interrater investigation. At the 90% confidence level, the minimum detectable change was 3.8% and the error in an individual's score at a given point in time was ±2.7%.
Conclusion: The Anteromedial Reach Test demonstrated excellent intrarater and interrater reliability in healthy participants. This provides a basis for future investigation of the measurement properties of the Anteromedial Reach Test in patients with anterior cruciate ligament injury.
Keywords: anterior cruciate ligament, injury, dynamic stability, rehabilitation, outcome measures

Introduction 

The anterior cruciate ligament (ACL) is one of the most frequently injured ligaments in the knee, with an estimated incidence in the UK of approximately 20,000 injuries annually. Most are noncontact injuries and occur during sports involving deceleration, pivoting, cutting, or jumping, such as soccer and basketball. One of the most common mechanisms of ACL injury is dynamic lower extremity valgus (DLEV), in which the knee is abducted, externally rotated, and partially flexed (Figure 1).

Following ACL injury, functional instability (ie, giving way or perceived instability of the knee) is a commonly experienced and disabling symptom, occurring during dynamic postures involving leg rotation, particularly DLEV. Surgical reconstruction is recommended for patients who experience repeated instability despite rehabilitation, or those deemed to be at risk of instability due to work, sport, or recreational requirements. Approximately 2,000 ACL reconstructions are performed annually in the English National Health Service, at an average cost of £1,500 per patient.

Figure 1 Dynamic lower extremity valgus.

Whether treated surgically or conservatively, several months of rehabilitation are usually required following ACL injury. Rehabilitation focuses on improving neuromuscular function to enable knee stabilization during dynamic activities, particularly those involving DLEV, with progress monitored using patient-reported questionnaires, eg, the Lysholm score, or performance-based measures of physical function, eg, isokinetic strength and hop testing. Given that the goal of rehabilitation is often to return patients to athletic activity, it is proposed that performance-based measures are particularly important. However, many performance-based measures have been criticized for testing the ability to produce force in the sagittal plane, rather than the ability to stabilize the knee during functional, multiplanar motions such as DLEV. To address this criticism, a new ACL-specific, performance-based measure, the Anteromedial Reach Test (ART), was created. Simple and inexpensive, the ART requires patients to stand on one leg while reaching as far as possible with the other leg in an anteromedial direction. It aims to test an individual's ability to dynamically stabilize the knee during DLEV.

Although similar to the anteromedial component of the Star Excursion Balance Test (SEBTam), the ART is designed to maximize knee involvement. During the SEBTam, participants are allowed to lean backwards and contact the floor with the toes of the reaching foot. By utilizing this tactic, a knee-injured patient could achieve a maximal reach distance whilst minimizing motion at the knee (Figure 2). In contrast, the ART does not permit participants to lean backwards and requires that they contact the ART board with the heel of the reaching foot. Therefore, greater knee motion (and possibly muscular activity) should be required to achieve a maximal distance on the ART (Figure 3). This might explain why, in a preliminary investigation of the measurement properties of the ART, significant differences were detected between the injured and uninjured legs of 30 ACL-deficient patients, providing evidence of known-groups validity. Conversely, in a study by Herrington et al, the SEBTam did not detect significant differences between the injured and uninjured legs of 25 ACL-deficient patients, nor did it detect significant differences between these patients and a group of matched healthy controls.

To be useful in clinical practice, the ART must demonstrate adequate measurement properties, including reliability, validity, and responsiveness. Reliability is often investigated first, being a prerequisite for the other properties. In addition, it is recommended that initial studies of a new measure are conducted with healthy volunteers rather than patients, to exclude any variability due to fluctuation in symptoms. For performance-based measures, where measurements are taken by a rater, both intrarater and interrater reliability are important. Intrarater reliability is the extent to which measurements taken by the same rater are consistent, while interrater reliability is the extent to which measurements taken by different raters are similar.

Figure 2 Participant performing anteromedial component of Star Excursion Balance Test on right leg, while leaning backwards and plantar flexing reaching foot. Right knee flexed to approximately 40°.

Figure 3 Participant performing Anteromedial Reach Test on right leg. Right knee flexed to approximately 60°.

In the aforementioned preliminary investigation of the measurement properties of the ART, intrarater reliability was found to be excellent in healthy volunteers (intraclass correlation coefficient [ICC] 0.96); however, the authors neglected to normalize reach distances for leg length, interrater reliability was not evaluated, and the report lacked sufficient detail because it was only published in the form of a short conference abstract. Therefore, no previously published study has adequately evaluated intrarater or interrater reliability of the ART, and so these properties require investigation. Also, the possibility of sex-related and bilateral differences in reliability should be considered. For example, significant fluctuations in neuromuscular function can occur throughout the female menstrual cycle, which could increase variability between repeated tests. Additionally, differences in reliability between the dominant leg and nondominant leg have been demonstrated for some performance-based measures (eg, jump testing). Accordingly, the primary objective of this study was to evaluate intrarater and interrater reliability of the ART in healthy participants. Secondary objectives were to evaluate reliability for each sex (men and women) and leg (dominant and nondominant).

Materials and methods 

Study design 

A repeated-measures design was used. Intrarater reliability was evaluated by comparing ART scores taken by the same rater (rater A) on three separate test occasions (labeled days 1, 2, and 3), a minimum of 2 days and a maximum of 7 days apart. Three test occasions were employed to allow for the possibility of a learning effect between days 1 and 2. In this event, day 1 would be considered a familiarization day, with reliability calculated using data from days 2 and 3 only. Interrater reliability was evaluated by comparing ART scores taken by two different raters (raters A and B) on the same test occasion (day 3). This took place on the final test occasion, so that interrater reliability could be analyzed independent of any learning effect.

Participants 

A power calculation determined that 19 participants of each sex were required for a reliability analysis involving two time points or raters, to distinguish ρ₀=0.7 from ρ₁=0.9 at α=0.05 and β=0.2. Allowing for a 10% dropout rate, 42 healthy volunteer staff and students were recruited from one department in a university in the UK. Participants provided written informed consent and the study was approved by the Nursing and Physiotherapy Ethics Panel (School of Health and Population Sciences, University of Birmingham). Two women and one man subsequently withdrew from the study before its completion, due to scheduling difficulties (n=2) and illness (n=1), leaving 39 participants (20 men, 19 women). Participant characteristics are shown in Table 1.

Participants were excluded if reporting a history of injury 
or surgery to the legs or lumbar spine, or any balance, neu- 
rologic, or uncorrected vision disorders, since these could 
affect neuromuscular performance. Those aged over 
45 years were also excluded because knee proprioception 
declines with increasing age and incidence of osteoarthritis.



Table 1 Mean (± SD) participant characteristics

               Men (n=20)   Women (n=19)   Total sample (n=39)
Age (years)    24.7±4.6     23.5±4.3       24.1±4.4
Height (m)     1.8±0.1      1.7±0.1        1.7±0.1
Weight (kg)    77.5±9.6     60.2±7.7       69.0±12.3
BMI (kg/m²)    23.9±2.5     21.9±2.4       22.9±2.6

Abbreviations: SD, standard deviation; BMI, body mass index.

To facilitate generalizability of findings and comparison with other studies, participant activity levels were recorded using the Marx Activity Rating Scale, which records the frequency of participation in athletic activity involving running, pivoting, cutting, and deceleration. A mean (± standard deviation) score of 9.7±3.9 was obtained, which is approximately equivalent to playing a sport involving all of these activities once a week (scores eight points), in addition to jogging three or more times a week (scores an additional two points).

Raters 

The ART is intended for clinical use by practitioners with 
varying levels of expertise at using the measure. We therefore 
selected a rater who had used the ART previously (rater A, 
a physiotherapy lecturer with 6 years of clinical experience 
and 2 months of ART experience) and a rater who had no 
previous experience with the ART (rater B, a physiotherapy 
lecturer with 7 years of clinical experience). Neither rater
had previous experience of using the Star Excursion Balance 
Test (SEBT). Rater B was familiarized with the ART during 
a single 30-minute session, prior to commencing the study. 
Rater A explained and demonstrated the ART procedure to 
rater B, using a standardized set of instructions. Rater B then 
practiced administering one bout of the ART, with rater A 
acting as the participant. 

Procedure 

The procedure is shown in Figure 4. Participants attended three test occasions (days 1, 2, and 3), barefooted and wearing shorts and a t-shirt. The mean (± standard deviation) interval between days 1 and 2 was 3.9±1.9 days, with 4.8±2.0 days between days 2 and 3. All three test occasions took place at the same time of day. To avoid impairment of neuromuscular function, participants were asked to get their normal amount of sleep on the nights before testing, and to avoid vigorous exercise for 24 hours before testing, alcohol or caffeine on the days of testing, and food or beverages other than water for 2 hours before testing.

Day 1

Testing was administered by rater A. Leg length was measured in the supine position with a standard tape measure, from the anterior superior iliac spine to the distal point of the medial malleolus. The dominant leg was determined to be the leg with which participants would choose to kick a ball. For this, and all subsequent testing, the ART procedure was first explained and demonstrated by the rater, using a standardized set of instructions. Next, participants performed one bout of the ART as previously recommended (eight practice trials, followed by five recorded trials, on each leg). There was 15 seconds between trials for data collection and 5 minutes between legs to avoid fatigue. Leg testing was preassigned according to a counterbalanced, randomized ordering across consecutive participants. Participants maintained their assigned order throughout the study.

Day 2 

Testing was administered by rater A, who was blind to 
previous findings. Participants performed one bout of the 
ART (eight practice trials, followed by five recorded trials, 
on each leg). 



Figure 4 Schematic diagram of study design. Day 1 (rater A): consent obtained, participant information collected, leg length measured, Marx activity scale completed, one ART bout performed. Day 2 (rater A), 2-7 days later: one ART bout performed. Day 3 (raters A and B), a further 2-7 days later: one ART bout performed with rater A or B, 5 minutes of rest, then one ART bout performed with the other rater. Intrarater reliability was assessed across days 1, 2, and 3; interrater reliability was assessed on day 3.
Abbreviation: ART, Anteromedial Reach Test.
Day 3

Participants performed two bouts of the ART, with one bout administered by rater A (who was blind to previous findings) and one by rater B. Rater testing was preassigned according to a counterbalanced randomized ordering across consecutive participants. Raters were blind to each other's findings. The first bout of the ART comprised eight practice trials, followed by five recorded trials, on each leg. After 5 minutes of rest, participants performed the second bout of the ART (one practice trial, followed by five recorded trials, on each leg). The additional practice trial was to prevent a performance decrease resulting from the rest period.

ART procedure 

The ART procedure has been described previously. Participants stood on the plastic ART board (Figure 5), measuring 150 cm × 90 cm. This was marked with four lines (left oblique, right oblique, transverse, sagittal), intersecting at a common origin. Strips of 2.5 cm wide semitransparent masking tape were placed over both oblique lines, which were oriented at 45° angles to the transverse line. New tape was applied for each bout of the ART.

Stance foot positions are shown in Figure 5. Participants were asked to reach as far as possible with the contralateral leg, along the corresponding oblique line, make a single, light touch-down onto the tape with the reaching heel (Figure 6), and return to the double-legged starting position, without moving the stance foot or placing substantive weight through the reach leg, as judged by the rater. This latter requirement was judged to have not been met if the reaching heel contacted the ART board in a heavy, uncontrolled manner, or if body weight was transferred forward onto the reaching heel after making contact with the board. Participants were also required to keep their hands on their hips, not lean backwards, and hold the knee and ankle of the reach leg in maximum extension and dorsiflexion, respectively. If these criteria were not met for any recorded trial, the trial was discounted and repeated.

Figure 5 ART board. Foot outlines (R and L) illustrate foot positions for testing right and left legs respectively, but do not appear on ART board.
Abbreviations: O, origin; RO, right oblique line; LO, left oblique line; T, transverse line; S, sagittal line; ART, Anteromedial Reach Test.

Figure 6 Anteromedial Reach Test performed on right leg.

Following a successful touch-down, the participant maintained contact until a ruler was slid to the back of the reaching
heel. The masking tape was then marked at this point with a 
pencil, and a standard tape measure was used to measure the 
distance from the origin, while the participant looked away. 
The mark was then erased. 

ART score calculation 

Normalized ART scores were calculated as the mean reach distance of five recorded trials, divided by leg length and multiplied by 100.
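
The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function name `normalized_art_score` and the sample reach distances and leg length are hypothetical values chosen for the example.

```python
from statistics import mean


def normalized_art_score(reach_distances_cm, leg_length_cm):
    """Return the mean reach distance as a percentage of leg length."""
    if leg_length_cm <= 0:
        raise ValueError("leg length must be positive")
    return mean(reach_distances_cm) / leg_length_cm * 100


# Hypothetical participant: five recorded trials (cm) and leg length (cm).
trials = [55.0, 56.5, 57.0, 56.0, 55.5]
score = normalized_art_score(trials, 90.0)
print(round(score, 1))  # mean reach 56.0 cm over a 90.0 cm leg
```

Expressing the score as a percentage of leg length allows participants of different stature to be compared on the same scale.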

Statistical analysis 

Statistical analysis was conducted using PASW Statistics 
(version 18.0.3; IBM, Somers, NY, USA) with the level of 
statistical significance set a priori at 0.05. Separate analyses 
were conducted for each leg (dominant leg and nondominant 
leg) and sex grouping (men, women, both sexes). Data were 
analyzed in four stages as follows. 

Outliers and normality of data 

Since outliers can markedly affect reliability statistics, they were excluded using a previously reported method for reliability studies. However, this method for identifying outliers assumes that data are normally distributed. Q-Q plots were used to check normality of the data in preference to statistical tests of normality because the latter are sensitive to outliers even when the underlying distribution is normal. There was no evidence that the distribution of the data departed from normality.

The method used for outlier identification and exclusion was as follows. A participant's data for two consecutive bouts of the ART were excluded from analyses when the difference between these two bouts lay outside of a calculated 99% acceptance range (mean difference between bouts for the group ± 2.576 standard deviations of the difference). This is because such a large difference between bouts, lying outside the range in which 99% of differences are expected to lie, could be expected to result from error in performing or administering one of the bouts. For the intrarater analysis, acceptance ranges were calculated for differences between days 1 and 2, and days 2 and 3. For the interrater analysis, these were calculated for differences between raters A and B on day 3.
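
The exclusion rule described above can be sketched as follows. Only the 2.576 multiplier and the mean ± standard deviation construction come from the text; the function names, the use of the sample (n-1) standard deviation, and the twelve hypothetical score pairs are assumptions made for illustration.

```python
from statistics import mean, stdev

Z_99 = 2.576  # multiplier for a 99% acceptance range under normality


def acceptance_range(differences):
    """Return (lower, upper) bounds: mean difference +/- 2.576 SD."""
    centre = mean(differences)
    spread = Z_99 * stdev(differences)  # sample SD is an assumption
    return centre - spread, centre + spread


def flag_outliers(bout_a, bout_b):
    """Return indices of participants whose between-bout difference
    falls outside the 99% acceptance range for the group."""
    diffs = [b - a for a, b in zip(bout_a, bout_b)]
    lower, upper = acceptance_range(diffs)
    return [i for i, d in enumerate(diffs) if d < lower or d > upper]


# Hypothetical normalized ART scores (%) for 12 participants on two days.
day1 = [60.0] * 12
day2 = [60.5, 59.7, 60.2, 59.6, 60.1, 60.3,
        59.8, 60.0, 60.4, 59.9, 59.5, 70.5]
print(flag_outliers(day1, day2))  # only the last participant is flagged
```

Because the suspect difference itself inflates the group standard deviation, a single value can only be flagged when the group is reasonably large, which is one reason the rule is conservative.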

Whether considering both sexes together, or each sex 
separately, the same outliers were identified. Two male partici- 
pants exceeded acceptance ranges for intrarater analysis of the 
dominant leg, so were excluded from this analysis. One of these 
participants also exceeded the ranges for interrater analyses of 
both legs, so was excluded from the interrater analyses. 

Following the exclusion of outliers, data for use in subsequent analyses were tested for normality using the Shapiro-Wilk test. Because there was no evidence of departure from normality, parametric statistical tests were used in all subsequent analyses.

Systematic bias 

Systematic bias is a trend for all participants' scores to improve or worsen between repeated assessments (eg, due to a between-session learning effect). For the intrarater analysis, we used repeated-measures analyses of variance to test for systematic bias between days 1, 2, and 3. Based on the results, data from all three test occasions were used in subsequent intrarater analyses, effectively increasing the sample size and improving the precision of results. For the interrater analysis, we used paired t-tests to test for systematic bias between raters A and B.
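
As a rough illustration of the interrater check, the paired t statistic can be computed from the per-participant differences between raters. The sketch below uses only the Python standard library and therefore stops at the t statistic and its degrees of freedom; the function name and the five hypothetical score pairs are assumptions, and the study itself used a statistics package for the full test.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(rater_a, rater_b):
    """Return (t statistic, degrees of freedom) for paired scores."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical normalized ART scores (%) from two raters.
rater_a = [61.0, 64.2, 58.9, 66.1, 70.3]
rater_b = [60.5, 64.8, 59.0, 65.7, 70.9]
t_stat, df = paired_t(rater_a, rater_b)

# With 4 degrees of freedom the two-sided 5% critical value is 2.776,
# so |t| below that gives no evidence of systematic bias.
print(df, abs(t_stat) < 2.776)
```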

Reliability 

For both intrarater and interrater reliability, we calculated an ICC (2,1) with absolute agreement and a 95% confidence interval. The ICC is the most commonly used reliability index for continuous data. An ICC ≥0.7 indicates "good" reliability and is reported as sufficient for using a measure in research. An ICC ≥0.9 indicates "excellent" reliability and is reported as sufficient for making clinical decisions regarding individuals.
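
For readers who wish to see how an ICC (2,1) with absolute agreement is obtained, the following sketch computes it from the mean squares of a two-way analysis of variance, using the standard form ICC(2,1) = (MSR - MSE) / (MSR + (k - 1)MSE + k(MSC - MSE)/n), where MSR, MSC, and MSE are the between-participant, between-rater (or occasion), and residual mean squares. The function name is hypothetical, and the small ratings matrix is an illustrative data set of the kind used in the ICC literature, not data from this study.

```python
def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    scores: one row per participant, one column per rater or occasion.
    Returns (icc, mse), where mse is the residual mean square.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    icc = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc, mse


# Illustrative ratings: six targets scored by four judges.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc, mse = icc_2_1(ratings)
print(round(icc, 2), round(mse, 2))
```

Because the between-rater mean square appears in the denominator, any systematic offset between raters lowers this form of the ICC, which is what "absolute agreement" means in practice.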

Measurement error 

For both the intrarater and interrater analyses, we estimated the standard error of measurement (SEM) as the square root of the mean square error term from the analysis of variance produced during the ICC calculation. A 95% confidence interval for the SEM was calculated using the method of Stratford and Goldsmith. The SEM represents the amount of error associated with a measure, expressed in actual units of measurement. Using the SEM from the intrarater analysis, the minimum detectable change at the 90% confidence level (MDC₉₀) was estimated as: SEM × √2 × 1.64. This is the smallest change in an individual's score considered to be a true change and not measurement error. Additionally, we estimated the error in an individual's score at a given point in time at the 90% confidence level as: ±SEM × 1.64. We used the 90% (rather than 95%) confidence level when estimating these values, based on the rationale that an individual's score should be interpreted more liberally than group scores. An explanation of the clinical application of these values is provided in the Discussion section.
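
The two formulas above are simple enough to check by hand, and the sketch below does so in Python. The function names are hypothetical; the constant 1.64 and the √2 factor are taken from the text, and the SEM of 1.62 is the intrarater value reported for the nondominant leg of the total sample.

```python
from math import sqrt

Z_90 = 1.64  # multiplier for the 90% confidence level, as used in the text


def sem_from_mse(mean_square_error):
    """SEM is the square root of the ANOVA mean square error."""
    return sqrt(mean_square_error)


def score_error_90(sem):
    """Half-width of the 90% interval around a single observed score."""
    return Z_90 * sem


def mdc_90(sem):
    """Minimum detectable change at the 90% confidence level."""
    return sem * sqrt(2) * Z_90


sem = 1.62  # reported intrarater SEM, nondominant leg, total sample
print(round(score_error_90(sem), 1))  # 2.7
print(round(mdc_90(sem), 1))          # 3.8
```

The √2 factor reflects that a change score is the difference between two measurements, each carrying its own measurement error.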

Results 

Intrarater analysis 

Mean (± standard deviation) ART scores for days 1, 2, and 3 are shown in Table 2. Repeated-measures analyses of variance indicated no significant systematic bias between testing occasions (Table 2). Reliability and measurement error statistics are presented in Table 3. ICC values ranged from 0.93 to 0.97, while SEM values ranged from 1.25 to 1.77, demonstrating similar reliability and measurement error for both legs and sexes. For the total sample, the error associated with an individual's score at a given point in time, at the 90% confidence level, was ±2.7% and the MDC₉₀ was 3.8%.

Table 2 Anteromedial Reach Test scores for days 1, 2, and 3 (intrarater analysis)

Group   Leg   n    ART scores, % (mean ± SD)           P-value
                   Day 1      Day 2      Day 3
Men     D     18   61.0±6.5   61.3±6.1   61.8±6.6     0.28
        ND    20   60.4±7.0   60.1±7.5   60.7±7.7     0.40
Women   D     19   66.4±6.5   66.7±6.8   66.7±7.1     0.69
        ND    19   67.1±7.2   67.4±7.4   67.3±7.5     0.83
Total   D     37   63.8±7.0   64.1±6.9   64.3±7.2     0.27
        ND    39   63.9±7.7   63.8±8.2   64.2±8.3     0.64

Note: P-value is for repeated-measures analysis of variance of systematic bias between days 1, 2, and 3.
Abbreviations: ART, Anteromedial Reach Test; D, dominant leg; ND, nondominant leg; n, number of participants included in analysis; SD, standard deviation.

Table 3 Intrarater reliability and measurement error statistics

Group   Leg   n    ICC (95% CI)        SEM, % (95% CI)     Error in an individual's score, %   MDC₉₀, %
Men     D     18   0.93 ([illegible])  [illegible]         [illegible]                         [illegible]
        ND    20   0.96 (0.92-0.98)    1.49 (1.22-1.92)    ±2.4                                3.5
Women   D     19   0.97 (0.93-0.99)    1.25 (1.02-1.63)    ±2.1                                2.9
        ND    19   0.94 (0.89-0.98)    1.77 (1.44-2.30)    ±2.9                                4.1
Total   D     37   0.96 (0.93-0.98)    1.46 (1.26-1.75)    ±2.4                                3.4
        ND    39   0.96 (0.93-0.97)    1.62 (1.40-1.93)    ±2.7                                3.8

Note: Error in an individual's score estimated at 90% confidence level.
Abbreviations: D, dominant leg; ND, nondominant leg; n, number of participants included in analysis; ICC, intraclass correlation coefficient; 95% CI, 95% confidence interval; SEM, standard error of measurement; MDC₉₀, minimum detectable change at 90% confidence level.

Interrater analysis 

Mean (± standard deviation) ART scores for raters A and B 
are shown in Table 4. There was no significant systematic 
bias between raters (Table 4). Reliability and measurement 
error statistics are presented in Table 5. ICC values ranged 
from 0.97 to 0.99, while SEM values ranged from 0.91 to 
1.32, demonstrating similar reliability and measurement error 
for both legs and sexes. 

Discussion 

The ART demonstrated excellent intrarater and interrater reliability in both the dominant leg and nondominant leg of healthy men and women. ICC values exceeded 0.9, suggesting sufficient reliability for making clinical decisions regarding individuals. Such high reliability is not uncommon for ACL performance-based measures in healthy volunteers, who often demonstrate greater consistency than symptomatic patients. For example, intrarater reliability values exceeding 0.9 have been reported for hop and isokinetic testing in uninjured participants. Although interrater reliability of ACL performance-based measures has not been widely investigated, ICC values exceeding 0.9 have been reported for isokinetic testing in healthy volunteers.

Table 4 Mean Anteromedial Reach Test scores for raters A and B (interrater analysis)

Group   Leg   n    ART scores, % (mean ± SD)   P-value
                   Rater A    Rater B
Men     D     19   61.3±6.9   61.0±7.3         0.47
        ND    19   60.5±7.8   60.2±7.6         0.33
Women   D     19   66.7±7.1   66.6±6.3         0.90
        ND    19   67.3±7.5   68.0±6.7         0.10
Total   D     38   64.0±7.5   63.8±7.3         0.53
        ND    38   63.9±8.3   64.1±8.2         0.50

Note: P-value is for paired t-test analysis of systematic bias between raters.
Abbreviations: ART, Anteromedial Reach Test; D, dominant leg; ND, nondominant leg; n, number of participants included in analysis; SD, standard deviation.

Only one previous study has evaluated the reliability of the ART, and it was presented only as a short conference abstract. As with our investigation, Rice et al demonstrated excellent intrarater reliability (ICC 0.96) of the ART in healthy volunteers; however, a flaw of this study is that reliability was calculated using reach distances that had not been normalized for leg length. Because non-normalized ART reach distances are related to leg length, they are effectively a surrogate measure of leg length and not a true indicator of an individual's ability. As in our study, the ART scores of healthy participants should be normalized by expressing them as a percentage of reaching leg length.

The reliability of a measure similar to the ART, ie, the SEBT, has been evaluated in healthy volunteers by several authors. Two studies used normalized reach distances to calculate reliability and are comparable with our investigation. Plisky et al found that intrarater reliability was good (range 0.85-0.88) for the three reach directions evaluated (anterior, posteromedial, posterolateral) but did not reach 0.9. Comparison with our investigation is limited, given that the SEBTam was not evaluated. Additionally, all measurements were taken during one test occasion, rather than several days apart, possibly reducing variation in performance. More recently, Munro and Herrington evaluated the intrarater reliability of all eight SEBT reach directions over 2 weeks. Reliability reached 0.9 for three of the eight reach directions (range 0.84-0.92), with an ICC of 0.85 for the SEBTam. Additionally, the MDC for the SEBTam was 5.1%, which is higher than the 3.8% for the ART in our study.

Table 5 Interrater reliability and measurement error statistics

Group   Leg   n    ICC (95% CI)       SEM, % (95% CI)
Men     D     19   0.97 (0.92-0.99)   1.32 (1.00-1.95)
        ND    19   0.99 (0.97-1.00)   0.91 (0.69-1.33)
Women   D     19   0.97 (0.92-0.99)   1.21 (0.92-1.80)
        ND    19   0.98 (0.93-0.99)   1.09 (0.82-1.61)
Total   D     38   0.97 (0.95-0.99)   1.26 (1.02-1.62)
        ND    38   0.98 (0.97-0.99)   1.04 (0.85-1.35)

Abbreviations: D, dominant leg; ND, nondominant leg; n, number of participants included in analysis; ICC, intraclass correlation coefficient; 95% CI, 95% confidence interval; SEM, standard error of measurement.

The interrater reliability of the SEBT was evaluated 
in the aforementioned study by Plisky et al, with ICCs all 
exceeding 0.9 (range 0.99-1.00). Comparison with our
investigation is again difficult, because the different raters 
simultaneously measured the same bout of the SEBT, rather 
than independently testing participants, possibly reducing 
performance variation. A shared finding with our investiga- 
tion is that interrater reliability was superior to intrarater 
reliability. Given that our interrater investigation took place 
over approximately 30 minutes, it is not surprising that par- 
ticipants demonstrated less variability between bouts of the 
ART than for the intrarater investigation, which took place 
over several days. 

Clinical relevance 

Our study contains a number of clinically relevant findings: 

1. There was no significant learning effect (systematic bias) between test occasions; therefore, no additional familiarization day is required before using the ART.

2. The error in an individual's score at a given point in time was ±2.7% at the 90% confidence level. Therefore, if, for example, an individual is observed to score 60% on the ART, we can be 90% confident that they have scored at least 57.3% (ie, 60%-2.7%) and not more than 62.7% (ie, 60%+2.7%).

3. The MDC₉₀ was 3.8%. This is the smallest change in an individual's score considered to be true change and not measurement error. Therefore, if an individual's ART score improves or worsens between repeated tests by less than 3.8%, we can be 90% confident that they are unchanged. It should be noted that this value is for use with individuals only. For groups, the MDC₉₀ should be divided by √n. For example, in a group of 40 participants, the MDC₉₀ would be 0.6%.

4. The excellent interrater reliability suggests that a clinician who has not used the ART before can become proficient with the measure following a single familiarization session.

5. The excellent interrater reliability demonstrates that a 5-minute rest period can be used between the practice and recorded trials. Unlike the SEBT, the ART does not currently allow such a rest period. Although our data do not suggest any physical fatigue, some participants indicated that during days 2 and 3, when already familiarized with the ART, eight practice trials felt onerous and fatiguing. We will consider using a 5-minute rest period following the practice trials in future studies.
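
The arithmetic behind findings 2 and 3 can be reproduced with a short sketch. The helper names below are hypothetical; the values 2.7, 3.8, and the group size of 40 are those quoted in the list above.

```python
from math import sqrt


def score_interval(observed, error):
    """Return the (lower, upper) bounds around an observed score."""
    return observed - error, observed + error


def is_true_change(before, after, mdc):
    """A change counts as real only if it is at least the MDC."""
    return abs(after - before) >= mdc


def group_mdc(individual_mdc, n):
    """MDC for a group mean: the individual MDC divided by sqrt(n)."""
    return individual_mdc / sqrt(n)


low, high = score_interval(60.0, 2.7)
print(round(low, 1), round(high, 1))     # 57.3 62.7
print(is_true_change(60.0, 63.0, 3.8))   # False: 3.0 is below 3.8
print(round(group_mdc(3.8, 40), 1))      # 0.6
```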

Study limitations 

Our study was designed in accordance with recommenda- 
tions for conducting a reliability study, considering such 
factors as sample size, blinding, representativeness of 
raters, systematic bias, appropriate statistical analysis, 
and clinical relevance. However, there are two
main limitations that should be considered when inter- 
preting its results. First, to exclude the effects of motor 
learning, the interrater reliability investigation did not 
take place until day 3. This meant that participants had 
already received instructions from rater A, ie, the more 
experienced rater. Therefore, any variability resulting 
from the initial instructions being given by different rat- 
ers was removed, possibly inflating interrater reliability. 
However, given that the ART is simple to perform and 
instructions were standardized, any such inflation is likely 
to be small. 

Second, although reported previously, outlier removal 
from a reliability study is not common practice and would 
have increased reliability. However, inspection of ART 
scores for the two excluded male participants supports the 
view that their results were anomalous. One excluded par- 
ticipant achieved consistent scores with rater A on days 1 
and 2 and rater B on day 3, but then showed a decrease 
of 7.3% on the dominant leg with rater A on day 3. The 
fact that this participant had already achieved consistency 
over three test occasions with two different raters suggests 
that an error occurred during the final bout of the ART. 
The other excluded participant demonstrated an increase 
of 10.5% on the dominant leg between days 1 and 2. This 
resulted from scoring 14.7% less on the dominant leg than 
on the nondominant leg on day 1, but then attaining parity 
between legs for all subsequent test occasions. The mean 
between-leg asymmetry for the rest of the sample on day 1 
was 2.5%, suggesting that an error occurred during the first 
test occasion. 

Table 6 Reliability and measurement error statistics with outliers included 

            Intrarater analysis                                                              Interrater analysis
Group  Leg  ICC (95% CI)      SEM, % (95% CI)   Error in an individual's score, %  MDC90, %  ICC (95% CI)      SEM, % (95% CI)
Men    D    0.89 (0.78-0.95)  2.42 (1.98-3.12)  ±4.0                               5.6       0.95 (0.89-0.98)  1.53 (1.16-2.24)
       ND                                                                                    0.98 (0.95-0.99)  1.17 (0.89-1.71)
Total  D    0.93 (0.89-0.96)  1.93 (1.67-2.29)  ±3.1                               4.5       0.97 (0.94-0.98)  1.37 (1.12-1.76)
       ND                                                                                    0.98 (0.96-0.99)  1.14 (0.93-1.34)

Note: Error in an individual's score estimated at 90% confidence level. 

Abbreviations: D, dominant leg; ND, nondominant leg; ICC, intraclass correlation coefficient; 95% CI, 95% confidence interval; SEM, standard error of measurement; 
MDC90, minimum detectable change at 90% confidence level. 

Table 6 shows the results of the six analyses from 
which outliers were originally excluded, with these two 
participants reincluded. The ICCs still exceed 0.9 in all 
but one case (intrarater reliability of the male dominant 
leg is now 0.89), demonstrating that reliability is not sub- 
stantially affected. However, the effect on the measurement 
error statistics is more marked. For the total sample, the 
MDC90 for the dominant leg increases from 3.4% (Table 3) 
to 4.5% (Table 6), a factor increase of 1.3. For the male 
subgroup, the MDC90 for the dominant leg increases from 
3.9% (Table 3) to 5.6% (Table 6), a factor increase of 1.4. 
Considering that the MDC90 for the male nondominant 
leg is 3.5%, this new value of 5.6% (from 3.9%) for the 
dominant leg seems abnormally high. This supports previ- 
ous findings that just a small number of outliers can sub- 
stantially inflate measurement error statistics. We believe 
that the inclusion of outliers in our analyses would have 
resulted in an unacceptable distortion of clinically mean- 
ingful values such as the MDC90, justifying the exclusion 
of these participants. 
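
The dominant-leg MDC90 values in Table 6 and the factor increases quoted above can be checked with a short calculation. The sketch below assumes the conventional relation MDC90 = 1.645 × √2 × SEM, which this section does not state explicitly. The function names are hypothetical, and the SEM inputs are the rounded values from Table 6.

```python
import math

Z_90 = 1.645  # two-sided z-value for a 90% confidence level


def mdc90(sem):
    """Minimum detectable change at 90% confidence (assumed conventional formula)."""
    return Z_90 * math.sqrt(2) * sem


def factor_increase(without_outliers, with_outliers):
    """Ratio of the statistic with outliers included to the value with outliers excluded."""
    return with_outliers / without_outliers


# Intrarater SEM for the dominant leg with outliers included (Table 6).
print("Men, dominant leg MDC90:", round(mdc90(2.42), 1))
print("Total, dominant leg MDC90:", round(mdc90(1.93), 1))

# MDC90 with outliers excluded (Table 3) versus included (Table 6).
print("Total factor increase:", round(factor_increase(3.4, 4.5), 1))
print("Men factor increase:", round(factor_increase(3.9, 5.6), 1))
```

Under that assumption, the rounded SEMs of 2.42 and 1.93 give MDC90 values of 5.6% and 4.5%, matching Table 6, and the ratios round to the reported factors of 1.4 and 1.3.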

Conclusion 

The ART demonstrated excellent levels of intrarater and 
interrater reliability in healthy volunteers, with no significant 
between-session learning effect. Reliability and measurement 
error were similar for both sexes (men and women) and legs 
(dominant and nondominant). The MDC90 was 3.8% and 
the error in an individual's score at a given point in time 
was ±2.7%. Now that reliability of the ART has been dem- 
onstrated in healthy volunteers, future studies can investigate 
its measurement properties in ACL-injured patients. 